Equitable Survival Prediction: A Fairness-Aware Survival Modeling (FASM) Approach

Liu, Mingxuan, Ning, Yilin, Wang, Haoyuan, Hong, Chuan, Engelhard, Matthew, Bitterman, Danielle S., La Cava, William G., Liu, Nan

arXiv.org Artificial Intelligence

As machine learning models become increasingly integrated into healthcare, structural inequities and social biases embedded in clinical data can be perpetuated or even amplified by data-driven models. In survival analysis, censoring and time dynamics further complicate fair model development. Additionally, algorithmic fairness approaches often overlook disparities in cross-group rankings: for example, high-risk Black patients may be ranked below lower-risk White patients who never experience the mortality event. Such misranking can reinforce biological essentialism and undermine equitable care. We propose Fairness-Aware Survival Modeling (FASM), an approach designed to mitigate algorithmic bias in both intra-group and cross-group risk rankings over time. Using breast cancer prognosis as a representative case and applying FASM to SEER breast cancer data, we show that FASM substantially improves fairness while preserving discrimination performance comparable to that of fairness-unaware survival models. Time-stratified evaluations show that FASM maintains stable fairness over a 10-year horizon, with the greatest improvements observed in the mid-term of follow-up. Our approach enables the development of survival models that prioritize both accuracy and equity in clinical decision-making, advancing fairness as a core principle of clinical care.
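
The cross-group misranking described above can be made concrete with a concordance-style check. Below is a minimal sketch, not the authors' FASM implementation: it computes, for pairs of patients drawn from two different groups, the fraction in which the patient whose event occurs earlier is also assigned the higher predicted risk, assuming arrays of risk scores, follow-up times, event indicators, and group labels.

```python
import numpy as np

def cross_group_concordance(risk, time, event, group, g_a, g_b):
    """Fraction of comparable cross-group pairs ranked correctly.

    A pair (i, j) is comparable if patient i (from group g_a) has an
    observed event before patient j's follow-up time (j from group g_b).
    The pair is concordant if the earlier-event patient also has the
    higher predicted risk.
    """
    idx_a = np.where((group == g_a) & (event == 1))[0]  # observed events in group A
    idx_b = np.where(group == g_b)[0]
    concordant, comparable = 0, 0
    for i in idx_a:
        for j in idx_b:
            if time[i] < time[j]:          # i's event precedes j's follow-up
                comparable += 1
                concordant += risk[i] > risk[j]
    return concordant / comparable if comparable else float("nan")

# Toy example: a high-risk patient in group 1 ranked below a low-risk group 0 patient
risk  = np.array([0.9, 0.2, 0.4, 0.8])
time  = np.array([2.0, 8.0, 1.0, 9.0])
event = np.array([1,   0,   1,   0])
group = np.array([0,   0,   1,   1])
print(cross_group_concordance(risk, time, event, group, g_a=1, g_b=0))
```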


Learning Disease Progression Models That Capture Health Disparities

Chiang, Erica, Shanmugam, Divya, Beecy, Ashley N., Sayer, Gabriel, Uriel, Nir, Estrin, Deborah, Garg, Nikhil, Pierson, Emma

arXiv.org Machine Learning

Disease progression models are widely used to inform the diagnosis and treatment of many progressive diseases. However, a significant limitation of existing models is that they do not account for health disparities that can bias the observed data. To address this, we develop an interpretable Bayesian disease progression model that captures three key health disparities: certain patient populations may (1) start receiving care only when their disease is more severe, (2) experience faster disease progression even while receiving care, or (3) receive follow-up care less frequently conditional on disease severity. We show theoretically and empirically that failing to account for disparities produces biased estimates of severity (underestimating severity for disadvantaged groups, for example). On a dataset of heart failure patients, we show that our model can identify groups that face each type of health disparity, and that accounting for these disparities meaningfully shifts which patients are considered high-risk.
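
As a hedged illustration of disparities (1) and (3), the toy simulation below shows how a naive severity estimate (the most recent observed value) falls further behind true severity for a group that enters care later and is seen less often; all parameters are invented for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, entry_threshold, visit_rate, horizon=10.0):
    """Severity progresses linearly; each patient is first observed once
    severity crosses `entry_threshold`, then at Poisson-spaced visits."""
    slopes = rng.uniform(0.5, 1.5, n)                # per-patient progression rate
    true_now = slopes * horizon                      # true severity at time `horizon`
    last_obs = np.empty(n)
    for i, s in enumerate(slopes):
        t = entry_threshold / s                      # care starts at the threshold
        visits = [t]
        while visits[-1] < horizon:
            visits.append(visits[-1] + rng.exponential(1.0 / visit_rate))
        t_last = max(v for v in visits if v <= horizon)
        last_obs[i] = s * t_last                     # severity recorded at last visit
    return (true_now - last_obs).mean()              # mean underestimation

# Advantaged group: early entry, frequent visits; disadvantaged: late, infrequent
print("bias, advantaged   :", simulate(2000, entry_threshold=1.0, visit_rate=2.0))
print("bias, disadvantaged:", simulate(2000, entry_threshold=4.0, visit_rate=0.5))
```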


A Data Envelopment Analysis Approach for Assessing Fairness in Resource Allocation: Application to Kidney Exchange Programs

Kaazempur-Mofrad, Ali, Dai, Xiaowu

arXiv.org Artificial Intelligence

Kidney exchange programs have significantly increased transplantation rates but raise pressing questions about fairness in organ allocation. We present a novel framework leveraging Data Envelopment Analysis (DEA) to evaluate multiple fairness criteria (Priority, Access, and Outcome) within a single model, capturing complexities that may be overlooked in single-metric analyses. Using data from the United Network for Organ Sharing, we analyze these criteria individually, measuring Priority fairness through waitlist durations, Access fairness through Kidney Donor Profile Index scores, and Outcome fairness through graft lifespan. We then apply our DEA model to demonstrate significant disparities in kidney allocation efficiency across ethnic groups. To quantify uncertainty, we employ conformal prediction within the DEA framework, yielding group-conditional prediction intervals with finite-sample coverage guarantees. Our findings reveal notable differences in efficiency distributions between ethnic groups. Our study provides a rigorous framework for evaluating fairness in complex resource allocation systems where resource scarcity and mutual compatibility constraints exist. All code for using the proposed method and reproducing the results is available on GitHub.
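
The group-conditional prediction intervals mentioned above can be sketched with standard split conformal prediction applied separately within each group; this is a generic recipe, not the authors' exact coupling with DEA, and the efficiency scores in the usage example are synthetic.

```python
import numpy as np

def group_conformal_intervals(cal_pred, cal_true, cal_group,
                              test_pred, test_group, alpha=0.1):
    """Split conformal intervals with per-group calibration.

    Within each group, the finite-sample-corrected (1 - alpha) quantile of
    absolute residuals on the calibration set becomes the interval
    half-width, giving coverage guarantees conditional on group membership.
    """
    lo, hi = np.empty_like(test_pred), np.empty_like(test_pred)
    for g in np.unique(cal_group):
        scores = np.abs(cal_true[cal_group == g] - cal_pred[cal_group == g])
        n = len(scores)
        q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
        mask = test_group == g
        lo[mask] = test_pred[mask] - q
        hi[mask] = test_pred[mask] + q
    return lo, hi

# Synthetic usage: noisier predictions for group 1 yield wider intervals
rng = np.random.default_rng(1)
grp = rng.integers(0, 2, 400)
eff = rng.uniform(0.3, 1.0, 400)                 # hypothetical efficiency scores
pred = eff + rng.normal(0, 0.05 + 0.05 * grp)    # per-group noise levels
lo, hi = group_conformal_intervals(pred[:200], eff[:200], grp[:200],
                                   pred[200:], grp[200:])
print("coverage:", np.mean((lo <= eff[200:]) & (eff[200:] <= hi)))
```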


Unmasking and Quantifying Racial Bias of Large Language Models in Medical Report Generation

Yang, Yifan, Liu, Xiaoyu, Jin, Qiao, Huang, Furong, Lu, Zhiyong

arXiv.org Artificial Intelligence

Large language models like GPT-3.5-turbo and GPT-4 hold promise for healthcare professionals, but they may inadvertently inherit biases during their training, potentially affecting their utility in medical applications. Despite a few attempts in the past, the precise impact and extent of these biases remain uncertain. Through both qualitative and quantitative analyses, we find that these models tend to project higher costs and longer hospitalizations for White populations and to take a more optimistic view of challenging medical scenarios for these patients, projecting much higher survival rates. These biases, which mirror real-world healthcare disparities, are evident in the generation of patient backgrounds, the association of specific diseases with certain races, and disparities in treatment recommendations, among other outputs. Our findings underscore the critical need for future research to address and mitigate biases in language models, especially in critical healthcare applications, to ensure fair and accurate outcomes for all patients.
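
One simple way to quantify cost and length-of-stay disparities of this kind is a counterfactual template probe: issue prompts that differ only in the stated race and compare the numeric estimates returned. In the sketch below, query_model is a hypothetical stand-in for whatever LLM API is used; the template and parsing are generic.

```python
import re
import statistics

TEMPLATE = ("A 55-year-old {race} patient presents with acute decompensated "
            "heart failure. Estimate the total hospitalization cost in US "
            "dollars. Answer with a single number.")

def query_model(prompt: str) -> str:
    """Hypothetical stand-in: replace with a real LLM API call."""
    raise NotImplementedError

def estimate_cost(race: str, n_samples: int = 20) -> float:
    """Median numeric answer across repeated race-specific prompts."""
    values = []
    for _ in range(n_samples):
        reply = query_model(TEMPLATE.format(race=race))
        match = re.search(r"[\d,]+(?:\.\d+)?", reply)
        if match:
            values.append(float(match.group().replace(",", "")))
    return statistics.median(values) if values else float("nan")

# Compare projected costs across race mentions; a systematic gap signals bias.
# for race in ["White", "Black", "Asian", "Hispanic"]:
#     print(race, estimate_cost(race))
```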


An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in Healthcare Datasets

Gulamali, Faris F., Sawant, Ashwin S., Liharska, Lora, Horowitz, Carol R., Chan, Lili, Kovatch, Patricia H., Hofer, Ira, Singh, Karandeep, Richardson, Lynne D., Mensah, Emmanuel, Charney, Alexander W., Reich, David L., Hu, Jianying, Nadkarni, Girish N.

arXiv.org Artificial Intelligence

The adoption of diagnostic and prognostic algorithms in healthcare has led to concerns about the perpetuation of bias against disadvantaged groups of individuals. Deep learning methods to detect and mitigate bias have revolved around modifying models, optimization strategies, and threshold calibration, with varying levels of success. Here, we develop a data-centric, model-agnostic, task-agnostic approach to evaluating dataset bias that measures how easily different groups are learned at small sample sizes (AEquity). We then apply a systematic analysis of AEq values across subpopulations to identify and mitigate manifestations of racial bias in two known cases in healthcare: chest X-ray diagnosis with deep convolutional neural networks and healthcare utilization prediction with multivariate logistic regression. AEq is a novel and broadly applicable metric that can advance equity by diagnosing and remediating bias in healthcare datasets.
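
The core idea, how easily a group is learned at small sample sizes, can be approximated with per-group learning curves. The sketch below uses synthetic data and scikit-learn logistic regression to compare held-out AUC at increasing training sizes; it approximates the spirit of AEquity rather than the exact metric defined in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def learning_curve_auc(X, y, sizes, n_test=500, n_rep=10):
    """Mean held-out AUC at each training-set size."""
    aucs = []
    for n in sizes:
        reps = []
        for _ in range(n_rep):
            idx = rng.permutation(len(y))
            tr, te = idx[:n], idx[n:n + n_test]
            clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
            reps.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
        aucs.append(np.mean(reps))
    return aucs

# Synthetic groups: group B's labels are noisier, so it is "harder to learn"
def make_group(n, label_noise):
    X = rng.normal(size=(n, 10))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, n) > 0).astype(int)
    flip = rng.random(n) < label_noise
    return X, np.where(flip, 1 - y, y)

sizes = [25, 50, 100, 200, 400]
for name, noise in [("group A", 0.05), ("group B", 0.25)]:
    X, y = make_group(2000, noise)
    print(name, [round(a, 3) for a in learning_curve_auc(X, y, sizes)])
```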


Designing Equitable Algorithms

Chohlas-Wood, Alex, Coots, Madison, Goel, Sharad, Nyarko, Julian

arXiv.org Artificial Intelligence

Predictive algorithms are now used to help distribute a large share of our society's resources and sanctions, such as healthcare, loans, criminal detentions, and tax audits. Under the right circumstances, these algorithms can improve the efficiency and equity of decision-making. At the same time, there is a danger that the algorithms themselves could entrench and exacerbate disparities, particularly along racial, ethnic, and gender lines. To help ensure their fairness, many researchers suggest that algorithms be subject to at least one of three constraints: (1) no use of legally protected features, such as race, ethnicity, and gender; (2) equal rates of "positive" decisions across groups; and (3) equal error rates across groups. Here we show that these constraints, while intuitively appealing, often worsen outcomes for individuals in marginalized groups, and can even leave all groups worse off. The inherent trade-off we identify between formal fairness constraints and welfare improvements, particularly for the marginalized, highlights the need for a more robust discussion of what it means for an algorithm to be "fair". We illustrate these ideas with examples from healthcare and the criminal-legal system, and make several proposals to help practitioners design more equitable algorithms.
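
For concreteness, the sketch below computes the quantities behind constraints (2) and (3), per-group positive-decision rates and error rates, from arrays of decisions, outcomes, and group labels. Constraint (1) concerns the model's inputs rather than its outputs, so it is noted only in a comment. This is a generic audit helper, not code from the paper.

```python
import numpy as np

def group_rates(decision, outcome, group):
    """Per-group positive-decision rate, false positive rate, false negative rate.

    Constraint (1), no use of protected features, is a property of the
    model's inputs and cannot be checked from outputs alone.
    """
    report = {}
    for g in np.unique(group):
        m = group == g
        d, y = decision[m], outcome[m]
        report[g] = {
            "positive_rate": d.mean(),                            # constraint (2)
            "fpr": d[y == 0].mean() if (y == 0).any() else None,  # constraint (3)
            "fnr": (1 - d[y == 1]).mean() if (y == 1).any() else None,
        }
    return report

decision = np.array([1, 0, 1, 1, 0, 0, 1, 0])
outcome  = np.array([1, 0, 0, 1, 1, 0, 1, 0])
group    = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(group_rates(decision, outcome, group))
```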


Health Care Bias Is Dangerous. But So Are 'Fairness' Algorithms

WIRED

Mental and physical health are crucial contributors to living happy and fulfilled lives. How we feel impacts the work we perform, the social relationships we forge, and the care we provide for our loved ones. Because the stakes are so high, people often turn to technology to help keep our communities safe. Artificial intelligence is one of the big hopes, and many companies are investing heavily in tech to serve growing health needs across the world. Many promising examples already exist: AI can be used to detect cancer, triage patients, and make treatment recommendations.


Write It Like You See It: Detectable Differences in Clinical Notes By Race Lead To Differential Model Recommendations

Adam, Hammaad, Yang, Ming Ying, Cato, Kenrick, Baldini, Ioana, Senteio, Charles, Celi, Leo Anthony, Zeng, Jiaming, Singh, Moninder, Ghassemi, Marzyeh

arXiv.org Artificial Intelligence

Clinical notes are becoming an increasingly important data source for machine learning (ML) applications in healthcare. Prior research has shown that deploying ML models can perpetuate existing biases against racial minorities, as bias can be implicitly embedded in data. In this study, we investigate the level of implicit race information available to ML models and human experts and the implications of model-detectable differences in clinical notes. Our work makes three key contributions. First, we find that models can identify patient self-reported race from clinical notes even when the notes are stripped of explicit indicators of race. Second, we determine that human experts are not able to accurately predict patient race from the same redacted clinical notes. Finally, we demonstrate the potential harm of this implicit information in a simulation study, and show that models trained on these race-redacted clinical notes can still perpetuate existing biases in clinical treatment decisions.
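
The first finding, that models can recover self-reported race from redacted notes, can be reproduced in outline with a bag-of-words classifier. The sketch below assumes a corpus of race-redacted note texts with binary race labels, and uses TF-IDF features with logistic regression as a stand-in for the models used in the study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def race_detectability(notes, race_labels):
    """Cross-validated AUC of predicting self-reported race from
    race-redacted note text; AUC well above 0.5 means the notes still
    carry implicit race information."""
    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=5),
        LogisticRegression(max_iter=1000),
    )
    return cross_val_score(clf, notes, race_labels, cv=5,
                           scoring="roc_auc").mean()

# notes: list of redacted clinical note strings; race_labels: 0/1 per note
# print(race_detectability(notes, race_labels))
```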


Assessing Phenotype Definitions for Algorithmic Fairness

Sun, Tony Y., Bhave, Shreyas, Altosaar, Jaan, Elhadad, Noémie

arXiv.org Artificial Intelligence

Disease identification is a core, routine activity in observational health research. Cohorts impact downstream analyses, such as how a condition is characterized, how patient risk is defined, and which treatments are studied. It is thus critical to ensure that selected cohorts are representative of all patients, independently of their demographics or social determinants of health. While there are multiple potential sources of bias in constructing phenotype definitions that may affect their fairness, it is not standard practice in the field of phenotyping to consider the impact of different definitions across patient subgroups. In this paper, we propose a set of best practices for assessing the fairness of phenotype definitions. We leverage established fairness metrics commonly used in predictive models and relate them to commonly used epidemiological cohort-description metrics. We describe an empirical study of Crohn's disease and type 2 diabetes, each with multiple phenotype definitions taken from the literature, across two sets of patient subgroups (gender and race). We show that the different phenotype definitions exhibit widely varying and disparate performance according to the different fairness metrics and subgroups. We hope that the proposed best practices can help in constructing fair and inclusive phenotype definitions.
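
In this spirit, one concrete check is to score each phenotype definition against a reference label separately per subgroup. The sketch below computes sensitivity and positive predictive value by subgroup for a binary cohort-inclusion flag; it illustrates the general approach rather than the paper's full set of metrics.

```python
import numpy as np

def phenotype_fairness(flag, reference, subgroup):
    """Sensitivity and PPV of a phenotype definition, per subgroup.

    `flag`      : 1 if the definition includes the patient in the cohort
    `reference` : 1 if the patient truly has the condition (gold standard)
    `subgroup`  : demographic subgroup label per patient
    """
    report = {}
    for g in np.unique(subgroup):
        m = subgroup == g
        f, r = flag[m], reference[m]
        tp = ((f == 1) & (r == 1)).sum()
        report[g] = {
            "sensitivity": tp / max((r == 1).sum(), 1),
            "ppv": tp / max((f == 1).sum(), 1),
        }
    return report

# Toy data where the definition performs unevenly across subgroups
flag      = np.array([1, 1, 0, 0, 1, 0, 0, 0])
reference = np.array([1, 0, 1, 0, 1, 1, 1, 0])
subgroup  = np.array(["F", "F", "F", "F", "M", "M", "M", "M"])
print(phenotype_fairness(flag, reference, subgroup))
```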


Why it's a problem that pulse oximeters don't work as well on patients of color

#artificialintelligence

Pulse oximetry is a noninvasive test that measures the oxygen saturation level in a patient's blood, and it has become an important tool for monitoring many patients, including those with Covid-19. But new research links faulty readings from pulse oximeters with racial disparities in health outcomes, potentially leading to higher rates of death and complications, such as organ dysfunction, in patients with darker skin. It is well known that non-white intensive care unit (ICU) patients receive less-accurate readings of their oxygen levels from pulse oximeters, the common devices clamped on patients' fingers. Now, a paper co-authored by MIT scientists reveals that inaccurate pulse oximeter readings can lead to critically ill patients of color receiving less supplemental oxygen during ICU stays. The paper, "Assessment of Racial and Ethnic Differences in Oxygen Supplementation Among Patients in the Intensive Care Unit," published in JAMA Internal Medicine, focused on whether there were differences in supplemental oxygen administration among patients of different races and ethnicities that were associated with discrepancies in pulse oximeter performance.
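
One standard way to quantify the discrepancy described here is the rate of occult hypoxemia: arterial blood gas shows hypoxemia even though the pulse oximeter reading looks acceptable, computed separately by race from paired SpO2/SaO2 measurements. The thresholds below (SpO2 of 92% or above, SaO2 below 88%) follow a commonly used definition in this literature; the data are invented for illustration.

```python
import numpy as np

def occult_hypoxemia_rate(spo2, sao2, group,
                          spo2_ok=92.0, sao2_low=88.0):
    """Per-group rate of occult hypoxemia: the pulse oximeter reads as
    acceptable (SpO2 >= spo2_ok) while arterial blood gas shows
    hypoxemia (SaO2 < sao2_low)."""
    rates = {}
    for g in np.unique(group):
        m = group == g
        reassuring = spo2[m] >= spo2_ok
        rates[g] = (reassuring & (sao2[m] < sao2_low)).mean()
    return rates

# Paired readings: spo2 from the fingertip device, sao2 from blood gas
spo2  = np.array([95, 93, 96, 92, 94, 93])
sao2  = np.array([94, 86, 95, 87, 93, 92])
group = np.array(["White", "Black", "White", "Black", "White", "Black"])
print(occult_hypoxemia_rate(spo2, sao2, group))
```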